Harnessing the Power of Robots.txt
Once we have a web site up and running, we need to make sure that all visiting search engines can access all the pages we want them to look at.
Sometimes, though, we may want search engines not to index certain parts of the site, or even to ban a particular search engine from the site altogether.
This is where a simple, little two-line text file called robots.txt comes in.
Robots.txt lives in your site's main directory (on Linux systems this is usually your /public_html/ directory) and looks something like the following:
User-agent: *
Disallow:
The first line specifies which robot the rule applies to; the second line controls whether that robot is allowed in, or which parts of the site it is not allowed to see.
If you want to deal with multiple robots, simply repeat the lines above for each one.
So, for example:
User-agent: googlebot
Disallow:
User-agent: askjeeves
Disallow: /
This will allow Google (user-agent name GoogleBot) to see every page and directory, while at the same time banning Ask Jeeves from the site entirely.
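The Disallow line can also name individual directories rather than the whole site. As a sketch, assuming you had hypothetical /cgi-bin/ and /private/ directories you wanted to keep out of every index, you could use:
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
Each Disallow line blocks one path, so you simply add one line per directory you want hidden.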
To find a fairly up-to-date list of robot user-agent names, visit http://www.robotstxt.org/wc/active/html/index.html
Even if you want to let every robot index every page of your site, it is still a very good idea to place a robots.txt file on your site. It will stop your error logs filling up with entries from search engines trying to request a robots.txt file that doesn't exist.
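If you want to double-check how a crawler will read your finished file, Python's standard urllib.robotparser module can fetch a robots.txt and answer allow/deny questions for any user-agent. A minimal sketch, assuming a hypothetical site www.example.com carrying the Google/Ask Jeeves rules above:
from urllib import robotparser
# Point the parser at the live robots.txt file (hypothetical site).
rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")
rp.read()
# GoogleBot is allowed everywhere; Ask Jeeves is banned from the whole site.
print(rp.can_fetch("googlebot", "http://www.example.com/index.html"))   # expected: True
print(rp.can_fetch("askjeeves", "http://www.example.com/index.html"))   # expected: False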
For more information on robots.txt, see the full set of resources about robots.txt at http://www.websitesecrets101.com/robotstxt-further-reading-resources.